Ref #


Hits


Search Query


DBs


Default Operator


Plurals 


Time Stamp 


L1


937 


web adj crawl$ 


US-PGPUB; 

USPAT; 

IBM_TDB 


OR 


ON 


2005/12/19 13:40 


L2 


13 


1 with statistic$ 


US-PGPUB; 

USPAT; 

IBM_TDB 


OR 


ON 


2005/12/19 15:07 


L3 


44 


1 same statistic$ 


US-PGPUB; 

USPAT; 

IBM_TDB


OR 


ON 


2005/12/19 15:07 


L4 


31 


3 not 2 


US-PGPUB; 
USPAT; 

IBM_TDB


OR 


ON 


2005/12/19 15:08 


L5 


5 


("6418433").URPN.


USPAT


OR


ON


2005/12/19


L6 


1 


("5515259").URPN.


USPAT


OR


ON


2005/12/19


L7 


1 


("5671723").URPN. 


USPAT 


OR 


ON 


2005/12/19 16:47 


L8 


64 


(user adj defin$) with predicate 


USPAT 


OR 


ON 


2005/12/19 18:18 


L9 


2 


((user adj defin$) with predicate) 
and (statistic$ near information) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L10


1


"6343288".URPN.


USPAT 


OR 


ON 


2005/12/19 18:18 


L11


2 


((user adj defin$) with predicate) 
and (web adj page) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L12 


2 


((user adj defin$) with predicate) 
and ((web adj page) html) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L13 


2 


((user adj defin$) with predicate) 
and ((webpage) html) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L14 


2714 


predicate 


USPAT 


OR 


ON 


2005/12/19 18:18 


L15 


111 


predicate and (statistic$ near 
information) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L16 


40 


(predicate and (statistic$ near 
information)) and (web adj page) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L17 


36485 


((uniform adj resource adj locator) 
url) token 


USPAT 


OR 


ON 


2005/12/19 18:18 


L18 


114 


((uniform adj resource adj locator) 
url) with token 


USPAT 


OR 


ON 


2005/12/19 18:18 


L19


0


((predicate and (statistic$ near
information)) and (web adj page))
and (((uniform adj resource adj
locator) url) with token)


USPAT


OR


ON


2005/12/19 18:18




L20


0


(predicate and (statistic$ near
information)) and (((uniform adj
resource adj locator) url) with
token)


USPAT


OR


ON


2005/12/19 18:18



Search History 12/19/05 6:23:39 PM Page 1 

C:\Documents and Settings\NHillery\My Documents\EAST\Workspaces\09703174 search.wsp



L21 


7 


predicate and (((uniform adj 
resource adj locator) url) with 
token) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L22 


39 


((predicate and (statistic$ near 
information)) and (web adj page)) 
and (retriev$ with (document page 
file)) 


USPAT


OR


ON


2005/12/19 18:18


L23 


38 


((predicate and (statistic$ near 
information)) and (web adj page)) 
and (retriev$ with (document)) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L24 


3284 


(707/3).CCLS. 


USPAT; 
USOCR 


OR 


OFF 


2005/12/19 18:18 


L25 


1314 


(707/6).CCLS. 


USPAT; 
USOCR 


OR 


OFF 


2005/12/19 18:18 


L26 


578 


(715/530).CCLS.


USPAT; 
USOCR 


OR 


OFF 


2005/12/19 18:18 


L27 


1121 


(715/513).CCLS. 


USPAT; 
USOCR 


OR 


OFF 


2005/12/19 18:18 


L28 


716 


(715/501.1).CCLS.


USPAT; 
USOCR 


OR 


OFF 


2005/12/19 18:18 


L29 


164 


(((715/530).CCLS.) ((715/513). 
CCLS.)) and (((707/3).CCLS.) 
((707/5).CCLS.)) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L30 


31 


(((715/530).CCLS.)) and
(((707/3).CCLS.)((707/6).CCLS.)) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L31 


138 


(((715/513).CCLS.)) and 
(((707/3).CCLS.) ((707/6).CCLS.)) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L32 


413 


(user adj defin$) with (query) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L33 


4672 


statistic$ near information 


USPAT 


OR 


ON 


2005/12/19 18:18 


L34 


23 


((user adj defin$) with (query)) 
and (statistic$ near information) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L35 


80 


((query and (statistic$ near 
information)) and (web adj page)) 
and (retriev$ with (document)) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L36 


10 


(((user adj defin$) with (query)) 
and (statistic$ near information)) 
and ((web adj page) html 
webpage) 


USPAT


OR


ON




L37 


5144 


search near3 (query term phrase) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L38 


513 


(search near3 (query term 
phrase)) same (retriev$ with 
(document file webpage (web adj 
page) page)) 


USPAT


OR


ON


2005/12/19 18:18


L39 


233


((search near3 (query term 
phrase)) same (retriev$ with 
(document file webpage (web adj 
page) page))) and (statistic$) 


USPAT


OR


ON


2005/12/19 18:18


L40


782 


(URL (uniform adj resource adj 
locator)) near3 (token$ string$) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L41 


9 


(((search near3 (query term 
phrase)) same (retriev$ with 
(document file webpage (web adj
page) page))) and (statistic$)) and 
((URL (uniform adj resource adj 
locator)) near3 (token$ string$)) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L42 


3006 


search near (query term phrase) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L43 


59 


(search near (query term phrase)) 
same (retriev$ with (webpage 
(web adj page) HTML)) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L44


L 


((search near (query term
phrase)) same (retriev$ with 
(webpage (web adj page) HTML))) 
and (statistic$ near information) 


USPAT


OR


ON


2005/12/19 18:18


L45


r 

b 


((search near (query term 
phrase)) same (retriev$ with 
(webpage (web adj page) HTML))) 
and (statistic$ with information) 


USPAT


OR


ON


2005/12/19 18:18


L46 


1644 


(707/4).CCLS. 


USPAT; 
USOCR 


OR 


OFF 


2005/12/19 18:18 


L47 


9 


("5519709" | "5692176" |
"5717914" | "5737734" |
"5926812" | "6212532" |
"6272495" | "6353823" |
"6353377").PN.


USPAT 


OR 


ON 


2005/12/19 18:18 


L48 


0


"5533858".URPN.


USPAT 


OR 


ON 


2005/12/19 18:18 


L49 


1120 


uri with (token$ string) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L50 


120 


(uri with (token$ string)) with 
(predicate query) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L51 


28 


((uri with (token$ string)) with 
(predicate query)) and (document 
with retriev$) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L52 


11 


("5442784" | "5694594" |
"5848407" | "5873081" |
"5937422" | "5940821" |
"5941944" | "5953718" |
"5963940" | "5991756" |
"6047126").PN.


USPAT 


OR 


ON 


2005/12/19 18:18 


L53 


29 


"6112203".URPN. 


USPAT 


OR 


ON 


2005/12/19 18:18 


L54 


29 


"6112203".URPN. 


USPAT 


OR 


ON 


2005/12/19 18:18 


L55 


2962 


(web adj page) with (relat$ 
retriev$) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L56 


712 


((web adj page) with (relat$ 
retriev$)) and (scor$5 relevanc$5 
rank$4) 


USPAT 


OR 


ON 


2005/12/19 18:18


L57


100 


((web adj page) with (relat$
retriev$)) same (scor$5 
relevanc$5 rank$4) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L58 


84 


(((web adj page) with (relat$
retriev$)) same (scor$5 
relevanc$5 rank$4)) and (content) 


USPAT 


OR 


ON 


2005/12/19 18:18


L59




"70 

/o 


((((web adj page) with (relat$
retriev$)) same (scor$5 
relevanc$5 rank$4)) and 
(content)) and link$ 


USPAT


OR


ON


2005/12/19 18:18


L60


"70 

/o 


((((web adj page) with (relat$ 
retriev$)) same (scor$5 
relevanc$5 rank$4)) and 
(content)) and link$5 


USPAT


OR


ON


2005/12/19 18:18


L61 


7 


(((((web adj page) with (relat$ 
retriev$)) same (scor$5 
relevanc$5 rank$4)) and 
(content)) and link$) and (uri with 
(token$ string)) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L62 


5


"5418433".URPN.


USPAT 


OR 


ON 


2005/12/19 18:18 


L63 


8 


("5369577" | 55iOo5z |
"5708829" | "5717912" |
"5784608" | "5787417" |
"5796952" | "5832494").PN.


USPAT


OR


ON


2005/12/19 18:18


L64 


191 


relevance adj feedback 


USPAT 


OR 


ON 


2005/12/19 18:18 


L65 


280 


relevance with feedback 


USPAT 


OR 


ON 


2005/12/19 18:18 


L66 


6 


(relevance with feedback) and 
((url hyperlink) with (token$
string)) 


USPAT 


OR 


ON 


2005/12/19 18:18 


L67 


6 


(relevance adj feedback) and ((url
hyperlink) with (token$ string)) 


USPAT 


OR 


ON 


2005/12/19 18:18



1 Evaluating topic-driven web crawlers
Filippo Menczer, Gautam Pant, Padmini Srinivasan, Miguel E. Ruiz
September 2001 Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM Press

Full text available: pdf (210.09 KB) Additional Information: full citation, abstract, references, citings, index terms



Due to limited bandwidth, storage, and computational resources, and to the dynamic 
nature of the Web, search engines cannot index every Web page, and even the covered 
portion of the Web cannot be monitored continuously for changes. Therefore it is essential 
to develop effective crawling strategies to prioritize the pages to be indexed. The issue is 
even more important for topic-specific search engines, where crawlers must make 
additional decisions based on the relevance of visited pages. ... 

Keywords: InfoSpiders, PageRank, Web information retrieval, best-first search, focused 
crawlers, performance metrics, topic driven crawling 



2 On the design of a learning crawler for topical resource discovery
Charu C. Aggarwal, Fatima Al-Garawi, Philip S. Yu
July 2001 ACM Transactions on Information Systems (TOIS), volume 19 issue 3
Publisher: ACM Press

Full text available: pdf (324.39 KB) Additional Information: full citation, abstract, references, index terms

In recent years, the World Wide Web has shown enormous growth in size. Vast 
repositories of information are available on practically every possible topic. In such cases, 
it is valuable to perform topical resource discovery effectively. Consequently, several new 
ideas have been proposed in recent years; among them a key technique is focused 
crawling which is able to crawl particular topical portions of the World Wide Web quickly, 
without having to explore all web pages. In this paper, we propose ... 

Keywords: Crawling, World Wide Web 



3 Effective page refresh policies for Web crawlers
Junghoo Cho, Hector Garcia-Molina
December 2003 ACM Transactions on Database Systems (TODS), volume 28 issue 4
Publisher: ACM Press

Full text available: pdf (345.52 KB) Additional Information: full citation, abstract, references, index terms

In this article, we study how we can maintain local copies of remote data sources "fresh," 
when the source data is updated autonomously and independently. In particular, we study 
the problem of Web crawlers that maintain local copies of remote Web pages for Web 
search engines. In this context, remote data sources (Websites) do not notify the copies 
(Web crawlers) of new changes, so we need to periodically poll the sources to maintain
the copies up-to-date. Since polling the sources ...

Keywords: Web crawlers, page refresh, web search engines, world-wide web 



4 Intelligent crawling on the World Wide Web with arbitrary predicates
Charu C. Aggarwal, Fatima Al-Garawi, Philip S. Yu
April 2001 Proceedings of the 10th international conference on World Wide Web
Publisher: ACM Press

Full text available: pdf (272.60 KB) Additional Information: full citation, references, citings, index terms



Keywords: World Wide Web, crawling, querying 



5 Characterizing a national community web
Daniel Gomes, Mario J. Silva
August 2005 ACM Transactions on Internet Technology (TOIT), volume 5 issue 3
Publisher: ACM Press

Full text available: pdf (364.77 KB) Additional Information: full citation, abstract, references, index terms

This article presents a characterization of the community Web of the people of Portugal. 
We defined criteria for delimiting this Web based on our past experience of crawling pages 
related to Portugal and collected over 3.2 million documents from 46,000 sites satisfying 
those criteria. Our characterization was derived from this crawl. We describe the rules 
that we established for defining the boundaries of this community Web and the 
methodology used to gather statistics. Statistics cover the numb ... 

Keywords: Portuguese Web, Web characterization, Web communities, Web 
measurements 



6 Poster papers: Collaborative crawling: mining user experiences for topical resource discovery
Charu C. Aggarwal
July 2002 Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Publisher: ACM Press

Full text available: pdf (691.02 KB) Additional Information: full citation, abstract, references, index terms

The rapid growth of the world wide web had made the problem of topic specific resource 
discovery an important one in recent years. In this problem, it is desired to find web 
pages which satisfy a predicate specified by the user. Such a predicate could be a 
keyword query, a topical query, or some arbitrary constraint. Several techniques such as
focussed crawling and intelligent crawling have recently been proposed for topic specific 
resource discovery. All these crawlers are linkage based, ... 

7 Tools & techniques track: searching and IR: Downloading textual hidden web content through keyword queries
Alexandros Ntoulas, Petros Zerfos, Junghoo Cho
June 2005 Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Publisher: ACM Press

Full text available: pdf (278.40 KB) Additional Information: full citation, abstract, references, index terms

An ever-increasing amount of information on the Web today is available only through
search interfaces: the users have to type in a set of keywords in a search form in order to 
access the pages from certain Web sites. These pages are often referred to as the Hidden 
Web or the Deep Web. Since there are no static links to the Hidden Web pages, search 
engines cannot discover and index such pages and thus do not return them in the results. 
However, according to recent studies, the conte ... 

Keywords: adaptive algorithm, deep web crawler, hidden web crawling, keyword
queries, query selection



8 A language and character set determination method based on N-gram statistics
Izumi Suzuki, Yoshiki Mikami, Ario Ohsato, Yoshihide Chubachi
September 2002 ACM Transactions on Asian Language Information Processing (TALIP), volume 1 issue 3
Publisher: ACM Press

Full text available: pdf (94.47 KB) Additional Information: full citation, abstract, references, index terms

An N-gram-based language, script, and encoding scheme-detection method is introduced 
in this article. The method detects language, script, and encoding schemes using a target 
text document encoded by computer by checking how many byte sequences of the target 
match the byte sequences that can appear in the texts belonging to a language, script, 
and encoding scheme. This detection mechanism is different from conventional N-gram- 
based methods in that its threshold for any category is uniquely prede ... 

Keywords: N-gram, Unicode, character set, corpus-based analysis, local language site, 
natural languages, text categorization 



9 Information retrieval on the web
Mei Kobayashi, Koichi Takeda
June 2000 ACM Computing Surveys (CSUR), volume 32 issue 2
Publisher: ACM Press

Full text available: pdf (213.89 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we review studies of the growth of the Internet and technologies that are 
useful for information search and retrieval on the Web. We present data on the Internet 
from several different sources, e.g., current as well as projected number of users, hosts, 
and Web sites. Although numerical figures vary, overall trends cited by the sources are 
consistent and point to exponential growth in the past and in the coming decade. Hence it 
is not surprising that about 85% of Internet user ... 

Keywords: Internet, World Wide Web, clustering, indexing, information retrieval, 
knowledge management, search engine 



10 Estimating frequency of change
Junghoo Cho, Hector Garcia-Molina
August 2003 ACM Transactions on Internet Technology (TOIT), volume 3 issue 3
Publisher: ACM Press

Full text available: pdf (353.56 KB) Additional Information: full citation, abstract, references, citings, index terms

Many online data sources are updated autonomously and independently. In this article, 
we make the case for estimating the change frequency of data to improve Web crawlers, 
Web caches and to help data mining. We first identify various scenarios, where different 
applications have different requirements on the accuracy of the estimated frequency. 
Then we develop several "frequency estimators" for the identified scenarios, showing 
analytically and experimentally how precise they are. In many cases, ... 

Keywords: Change frequency estimation, Poisson process 



11 Learning probabilistic models of the Web (poster session)
Thomas Hofmann
July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Publisher: ACM Press

Additional Information: full citation, abstract, references, citings, index terms



In the World Wide Web, myriads of hyperlinks connect documents and pages to create an 
unprecedented, highly complex graph structure - the Web graph. This paper presents a 
novel approach to learning probabilistic models of the Web, which can be used to make 
reliable predictions about connectivity and information content of Web documents. The 
proposed method is a probabilistic dimension reduction technique which recasts and 
unites Latent Semantic Analysis and Kleinberg's Hubs-and-Authorities al ... 

12 Searching the Web
August 2001 ACM Transactions on Internet Technology (TOIT), volume 1 issue 1
Publisher: ACM Press

Full text available: pdf (319.98 KB) Additional Information: full citation, abstract, references, citings, index terms, review

We offer an overview of current Web search engine design. After introducing a generic 
search engine architecture, we examine each engine component in turn. We cover 
crawling, local Web page storage, indexing, and the use of link analysis for boosting 
search performance. The most common design and implementation techniques for each of 
these components are presented. For this presentation we draw from the literature and 
from our own experimental search engine testbed. Emphasis is on introduci ... 

Keywords: HITS, PageRank, authorities, crawling, indexing, information retrieval, link 
analysis, search engine 



13 Building a distributed full-text index for the web
July 2001 ACM Transactions on Information Systems (TOIS), volume 19 issue 3
Publisher: ACM Press

Full text available: pdf (651.72 KB) Additional Information: full citation, abstract, references, index terms, review

We identify crucial design issues in building a distributed inverted index for a large 
collection of Web pages. We introduce a novel pipelining technique for structuring the 
core index-building system that substantially reduces the index construction time. We also 
propose a storage scheme for creating and managing inverted files using an embedded 
database system. We suggest and compare different strategies for collecting global 
statistics from distributed inverted indexes. Finally, we present pe ... 

Keywords: Distributed indexing, Embedded databases, Inverted files, Pipelining, Text
retrieval 



14 Web crawling and exploration: Probabilistic models for focused web crawling
Hongyu Liu, Evangelos Milios, Jeannette Janssen
November 2004 Proceedings of the 6th annual ACM international workshop on Web information and data management
Publisher: ACM Press

Full text available: pdf (384.56 KB) Additional Information: full citation, abstract, references, index terms

A Focused crawler must use information gleaned from previously crawled page sequences 
to estimate the relevance of a newly seen URL. Therefore, good performance depends on 
powerful modelling of context as well as the current observations. Probabilistic models, 
such as Hidden Markov Models(HMMs) and Conditional Random Fields(CRFs), can 
potentially capture both formatting and context. In this paper, we present the use of HMM 
for focused web crawling, and compare it with Best-First strategy. Fur ... 

Keywords: conditional random fields, focused crawling, hidden Markov models, web 
graph, world wide web 



15 An adaptive model for optimizing performance of an incremental web crawler
Jenny Edwards, Kevin McCurley, John Tomlin
April 2001 Proceedings of the 10th international conference on World Wide Web
Publisher: ACM Press

Full text available: pdf (201.50 KB) Additional Information: full citation, references, citings, index terms



Keywords: crawler, incremental crawler, optimization, scalability 



16 Short papers: Discovery of ads web hosts through traffic data analysis
V. Bacarella, F. Giannotti, M. Nanni, D. Pedreschi
June 2004 Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Publisher: ACM Press

Full text available: pdf (189.30 KB) Additional Information: full citation, abstract, references

One of the most actual problems on web crawling -- the most expensive task of any
search engine, in terms of time and bandwidth consumption -- is the detection of useless
segments of Internet. In some cases such segments are purposely created to deceive the 
crawling engine while, in others, they simply do not contain any useful information. 
Currently, the typical approach to the problem consists in using a human-compiled 
blacklist of sites to avoid (e.g., advertising sites and web counter ... 



17 Learning to crawl: Comparing classification schemes
Gautam Pant, Padmini Srinivasan
October 2005 ACM Transactions on Information Systems (TOIS), volume 23 issue 4
Publisher: ACM Press

Full text available: pdf (940.75 KB) Additional Information: full citation, abstract, references, index terms

Topical crawling is a young and creative area of research that holds the promise of 
benefiting from several sophisticated data mining techniques. The use of classification 
algorithms to guide topical crawlers has been sporadically suggested in the literature. No 
systematic study, however, has been done on their relative merits. Using the lessons 
learned from our previous crawler evaluation studies, we experiment with multiple 
versions of different classification schemes. The crawling process is ...

Keywords: Topical crawlers, classifiers, focused crawlers, machine learning 



18 Data integrity: Web application security assessment by fault injection and behavior monitoring
Yao-Wen Huang, Shih-Kun Huang, Tsung-Po Lin, Chung-Hung Tsai
May 2003 Proceedings of the 12th international conference on World Wide Web
Publisher: ACM Press

Full text available: pdf (4.53 MB) Additional Information: full citation, abstract, references, citings, index terms

As a large and complex application platform, the World Wide Web is capable of delivering
a broad range of sophisticated applications. However, many Web applications go through 
rapid development phases with extremely short turnaround time, making it difficult to 
eliminate vulnerabilities. Here we analyze the design of Web application security 
assessment mechanisms in order to identify poor coding practices that render Web
applications vulnerable to attacks such as SQL injection and cross-site scr ... 

Keywords: black-box testing, complete crawling, fault injection, security assessment, 
web application testing 



19 Web search 1: Topic-oriented collaborative crawling
Chiasen Chung, Charles L. A. Clarke
November 2002 Proceedings of the eleventh international conference on Information and knowledge management
Publisher: ACM Press

Full text available: pdf (179.28 KB) Additional Information: full citation, abstract, references, index terms



A major concern in the implementation of a distributed Web crawler is the choice of a
strategy for partitioning the Web among the nodes in the system. Our goal in selecting
this strategy is to minimize the overlap between the activities of individual nodes. We 
propose a topic-oriented approach, in which the Web is partitioned into general subject 
areas with a crawler assigned to each. We examine design alternatives for a topic- 
oriented distributed crawler, including the creation of a Web page cl ... 

Keywords: distributed systems, text categorization, web crawling 



20 Organizing topic-specific web information
Sougata Mukherjea
May 2000 Proceedings of the eleventh ACM on Hypertext and hypermedia
Publisher: ACM Press